Using Longest Common Subsequence Matching for Chinese Information Retrieval

Authors

Yun Xiao

Robert Wing Pong Luk

Kam-Fai Wong

Kui-Lam Kwok

Abstract

This paper applies longest common subsequence (LCS) matching to Chinese information retrieval. We re-ranked the retrieved documents by a mixture of the original similarity score and an LCS score obtained by matching the document titles against the query. This LCS-based similarity score is also used in pseudo-relevance feedback (PRF) in various ways (e.g., selecting terms and filtering out documents with low LCS values). We evaluated the use of LCS in title re-ranking and PRF on the NTCIR-4 test collection for Chinese ad hoc information retrieval. For title queries, our best mean average precision (MAP) is 26.7% under rigid relevance judgement and 30.2% under relaxed relevance judgement.
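
The title re-ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mixture formula, so the linear interpolation, the weight `alpha`, the normalisation of the LCS length by query length, and the character-level matching used here are all assumptions made for illustration, not the authors' method. The function names `lcs_length`, `lcs_score`, and `rerank` are hypothetical.

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(query: str, title: str) -> float:
    """Assumed normalisation: LCS length divided by query length, in [0, 1]."""
    if not query:
        return 0.0
    return lcs_length(query, title) / len(query)


def rerank(
    query: str,
    results: List[Tuple[str, str, float]],
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, title, original_score) triples.

    Assumption: the "mixture" is a linear interpolation between the
    original score and the title LCS score, weighted by alpha.
    """
    rescored = [
        (doc_id, alpha * score + (1.0 - alpha) * lcs_score(query, title))
        for doc_id, title, score in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = "中文信息检索"
    results = [
        ("d1", "天气预报", 0.9),
        ("d2", "中文文本信息检索方法", 0.8),
    ]
    for doc_id, score in rerank(query, results, alpha=0.5):
        print(doc_id, round(score, 3))
```

In this sketch the original scores are assumed to lie on a scale comparable to the normalised LCS score; in practice they would likely need normalising before being mixed.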

Similar resources

Speeding up transposition-invariant string matching

Finding the longest common subsequence (LCS) of two given sequences $A = a_0 a_1 \ldots a_{m-1}$ and $B = b_0 b_1 \ldots b_{n-1}$ is an important and well studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the longest common subsequence between the sequences $A + t = (a_0 + t)(a_1 + t) \ldots$ ...

Query Terms Extraction from Patent Document for Invalidity Search

This paper describes our patent retrieval system, which participated in the Document Retrieval Subtask of the NTCIR-5 Patent Retrieval Task. The main aim of our method is appropriate query expansion to improve recall. We extracted query terms from the topic claim, and expanded them with terms extracted from explanatory sentences in the patent document that includes the topic claim. The explanation sentences wer...

Time-Warped Longest Common Subsequence Algorithm for Music Retrieval

Recent advances in music information retrieval have enabled users to query a database by singing or humming into a microphone. The queries are often inaccurate versions of the original songs due to singing errors and errors introduced in the music transcription process. In this paper, we present the Time-Warped Longest Common Subsequence algorithm (T-WLCS), which deals with singing errors invol...

Application of Natural Language Processing Tools in Stemming

This work attempts to develop a novel conflation method that exploits the phonetic quality of words and uses standard NLP tools such as LD (Levenshtein Distance) and LCS (Longest Common Subsequence) for the stemming process. General terms: Information Retrieval (IR), Stemming.

Fast and Cache-Oblivious Dynamic Programming with Local Dependencies

String comparison, such as sequence alignment, edit distance computation, longest common subsequence computation, and approximate string matching, is a key task (and often a computational bottleneck) in large-scale textual information retrieval. For instance, algorithms for sequence alignment are widely used in bioinformatics to compare DNA and protein sequences. These problems can all be solved us...

Journal title:

Journal of Chinese Language and Computing

Volume 15

Publication date: 2005
